A new look at tree models for multiple sequence alignment
Abstract
Evolutionary trees are frequently used as the underlying model in the design of algorithms, optimization criteria and software packages for multiple sequence alignment (MSA). In this paper, we reexamine the suitability of trees as a universal model for MSA in light of the broad range of biological questions that MSAs are used to address. A tree model consists of a tree topology and a model of accepted mutations along the branches. After surveying the major applications of MSA, we use examples from the molecular biology literature to illustrate situations in which this tree model fails. This occurs when the relationship between residues in a column cannot be described by a tree, as in some structural and functional applications of MSA. It also occurs in situations, such as lateral gene transfer, where an entire gene cannot be modeled by a unique tree. In cases of nonparsimonious data or convergent evolution, it may be difficult to find a consistent mutational model. We hope that this survey will promote dialogue between biologists and computer scientists, leading to more biologically realistic research on MSA.
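To make "a tree topology plus a model of accepted mutations" concrete, the sketch below scores a single alignment column on a fixed topology using Fitch's small-parsimony algorithm, the standard textbook method. This is an illustrative choice and not the method proposed in this paper. It assumes unit cost for every substitution (a real mutation model such as a PAM-style matrix would weight changes differently). The function name `fitch_column`, the nested-tuple tree encoding and the toy sequence names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a sequence name (str);
# an internal node is a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_column(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one alignment column.

    Returns the candidate residue set at the root of `tree` and the minimum
    number of substitutions (unit cost, an assumption) needed to explain the
    residues in `column` on this topology.
    """
    if isinstance(tree, str):
        # Leaf: the state set is the single residue observed for this sequence.
        return {column[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, column)
    right_set, right_cost = fitch_column(right, column)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a residue: no extra substitution here.
        return shared, left_cost + right_cost
    # The subtrees disagree: keep both options and charge one substitution.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    column = {"human": "A", "chimp": "G", "mouse": "A", "rat": "G"}
    tree_a = (("human", "chimp"), ("mouse", "rat"))
    tree_b = (("human", "mouse"), ("chimp", "rat"))
    print(fitch_column(tree_a, column)[1])  # 2 substitutions
    print(fitch_column(tree_b, column)[1])  # 1 substitution
```

The same column needs two substitutions on one topology and only one on the other. When different columns favour different topologies, or when a column can only be explained by repeated (convergent) changes, no single tree and mutation model fits all of the data. This is the kind of failure the abstract describes.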
Similar resources
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Probabilistic Phylogenetic Inference with Insertions and Deletions
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for in...
Large-Scale Multiple Sequence Alignment and Phylogeny Estimation
With the advent of next generation sequencing technologies, alignment and phylogeny estimation of datasets with thousands of sequences is being attempted. To address these challenges, new algorithmic approaches have been developed that have been able to provide substantial improvements over standard methods. This paper focuses on new approaches for ultra-large tree estimation, including methods...
Simultaneous Sequence Alignment and Tree Construction Using Hidden Markov Models
We present a new algorithm (SATCHMO) that simultaneously estimates a tree and generates a set of multiple sequence alignments given a set of protein sequences. Alignments are constructed for each node in the tree. These alignments predict the structurally conserved elements of the sequences in a subtree and are therefore of different lengths, and represent different amino acid preferences, at d...